{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Python API\n",
    "\n",
    "HyperTS是DataCanvas Automatic Toolkit(DAT)工具链中，依托[Hypernets](https://github.com/DataCanvasIO/Hypernets)研发的关于时间序列的全Pipeline的自动化工具包。它遵循了make_expriment的使用习惯(类似于[HyperGBM](https://github.com/DataCanvasIO/HyperGBM)的API，一个针对于结构化表格数据的AutoML工具)，也符合```scikit-learn```中model API的使用规范。我们可以创造一个```make_expriment```，```run```之后获得pipeline_model, 即一个最终优化完毕的estimator, 然后使用它的```predict```, ```evaluate```, ```plot```去分析未知的数据。\n",
    "\n",
    "接下来，我们一起动手快速看三个案例指南吧。\n",
    "\n",
    "\n",
    "1. 预测案例\n",
    "2. 分类案例\n",
    "3. 异常检测案例"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. 预测案例"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 准备数据\n",
    "\n",
    "PS: 如果您对于HyperTS的数据形式存有疑惑，请回看[01_datatypes_for_hyperts.ipynb](https://github.com/DataCanvasIO/HyperTS/blob/main/examples/01_datatypes_for_hyperts.ipynb)中的介绍。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts.datasets import load_network_traffic\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在划分训练集和测试集中，由于数据存在时间上的先后顺序，为防止信息泄露，我们从整体数据集的后边切分一部分，故```shuffle=False```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = load_network_traffic()\n",
    "train_data, test_data = train_test_split(df, test_size=168, shuffle=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在本案例数据集，我们暴露其一些基本信息供参考，具体如下：\n",
    "\n",
    "- 时间列名称: 'TimeStamp';\n",
    "- 目标列名称: ['Var_1', 'Var_2', 'Var_3', 'Var_4', 'Var_5', 'Var_6'];\n",
    "- 协变量列名称: ['HourSin', 'WeekCos', 'CBWD'];\n",
    "- 时间频率: 'H'."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 创建实验\n",
    "\n",
    "我们通过创建实验```make_experiment```搜索一个时序模型, 并调用```run()```方法来执行实验。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts import make_experiment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**注意：** \n",
    "\n",
    "在预测任务中，我们必须向```make_experiment```传入参数```timestamp```列名。如果存在协变量，也需要传入```covariables```列名。\n",
    "\n",
    "因此，在本案例中，我们需要向```make_experiment```传入以下参数：\n",
    "\n",
    "1.告知其我们现在需要做一个时序预测任务，即```task='forecast'```;\n",
    "\n",
    "2、告知其数据集的时间列名称，即```timestamp='TimeStamp'```；\n",
    "\n",
    "3、告知其数据集中协变量列的名称，即```covariables=['HourSin', 'WeekCos', 'CBWD']```;\n",
    "\n",
    "4、如果想要获得强大的性能表现，还可以修改其他默认的参数，具体可以参考参数介绍。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data=train_data.copy(),\n",
    "                             task='forecast',\n",
    "                             timestamp='TimeStamp',\n",
    "                             covariables=['HourSin', 'WeekCos', 'CBWD'])\n",
    "model = experiment.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "看一看获得最终搜索的模型的参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.get_pipeline_params()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3 结果预测"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对test data切分X与y, 调用```predict()```方法执行结果预测。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test, y_test = model.split_X_y(test_data.copy())\n",
    "forecast = model.predict(X_test)\n",
    "forecast.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 结果评估"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "调用```evaluate```方法执行结果评估，便可以观测到各个评估指标下的得分情况。\n",
    "\n",
    "这里会返回一些默认的指标评分，如果想观测指定指标的评分，可以设置参数```metrics```， 例如metrics=['mae', 'mse', mape_func]。\n",
    "\n",
    "其中，mape_func可以是自定义的评估函数或者来自于sklearn的评估函数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = model.evaluate(y_true=y_test, y_pred=forecast)\n",
    "results.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.5 可视化"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "调用```plot()```方法可视化，看一看预测曲线，并与实际的曲线对比一下。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.plot(forecast=forecast, actual=test_data, var_id='Var_3', interactive=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**注意**\n",
    "\n",
    "- 这里会显示某一个变量的预测曲线，默认为第一个目标变量；\n",
    "- 如果为多变量预测，想要观测其他的变量曲线变化的情况，可以修改参数```var_id```, 例如：var_id=2 或者var_id='Var_2'；\n",
    "- ```plot```可支持交互式可视化通过设置```interactive=False```(默认交互, 需安装plotly);\n",
    "- 绘制更长期的历史信息，设置参数```history=sub_train_data```."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.6 模型保存"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "直接采用``save()``方式."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.save(model_file=\"./xxx/xxx/models\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.7 模型加载"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts.utils.models import load_model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline_model = load_model(model_file=\"./xxx/xxx/models/stats_models\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**注意**\n",
    "\n",
    "加载模型时，需要指定模型保存的文件夹，即\"./xxx/xxx/models/stats_models\" 而不是\"./xxx/xxx/models\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<bar>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<bar>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 分类案例"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 准备数据集 \n",
    "\n",
    "PS: 如果您对于HyperTS的数据形式存有疑惑，请回看[01_datatypes_for_hyperts.ipynb](https://github.com/DataCanvasIO/HyperTS/blob/main/examples/01_datatypes_for_hyperts.ipynb)中的介绍。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts.datasets import load_basic_motions\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = load_basic_motions()\n",
    "train_df, test_df = train_test_split(df, test_size=0.2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对本案例数据集，为了方便理解，我们暴露一些基本的信息如下：\n",
    "\n",
    "- 特征列名称: ['Var_1', 'Var_2', 'Var_3', 'Var_4', 'Var_5', 'Var_6'];\n",
    "- 目标列名称: 'target'."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 创建实验\n",
    "\n",
    "我们通过创建实验```make_experiment```搜索一个时序模型, 并调用```run()```方法来执行实验。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data=train_df.copy(), task='classification', target='target')\n",
    "\n",
    "model = experiment.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "看一看获得最终搜索的模型的参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.get_pipeline_params()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 结果预测"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对test data切分X与y, 调用```predict()```方法执行结果预测。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test, y_test = model.split_X_y(test_df.copy())\n",
    "y_pred = model.predict(X_test)\n",
    "y_proba = model.predict_proba(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4 结果评估"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "调用```evaluate```方法执行结果评估，便可以观测到各个评估指标下的得分情况。\n",
    "\n",
    "\n",
    "这里会返回一些默认的指标评分，如果想观测指定指标的评分，可以设置参数```metrics```， 例如metrics=['accuracy', 'auc', f1_func]。\n",
    "\n",
    "其中，mape_func可以是自定义的评估函数或者来自于sklearn的评估函数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = model.evaluate(y_true=y_test, y_pred=y_pred, y_proba=y_proba)\n",
    "results.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.5 模型保存"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "调用``save_model()``保存模型."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"\" width = \"500\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts.utils.models import save_model\n",
    "\n",
    "save_model(model=model, model_file=\"./xxx/xxx/models\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. 异常检测案例"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1 准备数据集\n",
    "\n",
    "PS: 如果您对于HyperTS的数据形式存有疑惑，请回看[01_datatypes_for_hyperts.ipynb](https://github.com/DataCanvasIO/HyperTS/blob/main/examples/01_datatypes_for_hyperts.ipynb)中的介绍。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts.datasets import load_real_known_cause_dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = load_real_known_cause_dataset()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通常情况下，我们在实际场景中所获得的数据是无标注的。因此，我们将数据中的真实标签分离出来。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ground_truth = df.pop('anomaly')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "拆分训练集和测试集。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts.toolbox import temporal_train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_data, test_data = temporal_train_test_split(df, test_horizon=15000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 创建实验\n",
    "\n",
    "我们通过创建实验```make_experiment```搜索一个时序模型, 并调用```run()```方法来执行实验。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts import make_experiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(\n",
    "    train_data=train_data.copy(),\n",
    "    mode='dl',\n",
    "    task='detection',\n",
    "    timestamp='timestamp',\n",
    "    max_trials=30,\n",
    "    random_state=2022)\n",
    "\n",
    "model = experiment.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "看一看获得最终搜索的模型的参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.get_pipeline_params()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 结果预测"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对test data切分X与y, 调用```predict()```和```predict_proba()```方法执行结果预测。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test, _ = model.split_X_y(test_data.copy())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred = model.predict(X_test)\n",
    "y_proba = model.predict_proba(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4 结果评估\n",
    "\n",
    "\n",
    "调用```evaluate```方法执行结果评估，便可以观测到各个评估指标下的得分情况。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_test = ground_truth.iloc[-15000:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "scores = model.evaluate(y_true=y_test, y_pred=y_pred, y_proba=y_proba)\n",
    "scores"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.5 可视化\n",
    "\n",
    "调用```plot()```方法可视化，看一看异常点的分布和各个时间点的异常程度。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.plot(y_pred, actual=test_data, history=train_data, interactive=False)"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "29dd11bf16d40fe19950dcc2f06dd773fb6bc4491ac296fb211bfed7a4a532da"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
